# Codebook: The Neighborhood Hangover Dataset

## File: brewery_closures.csv

Primary dataset of 15,936 brewery closures identified through version control archaeology.

| Variable | Type | Description | Values/Range | Missing |
|----------|------|-------------|--------------|---------|
| `brewery_id` | string | Unique brewery identifier from OBDB | UUID format | 0 |
| `name` | string | Brewery name | Text | 12 |
| `street` | string | Street address | Text | 1,247 |
| `city` | string | City name | Text | 0 |
| `state` | string | State name or abbreviation | 50 states + DC | 0 |
| `postal_code` | string | ZIP code | 5-digit | 89 |
| `latitude` | float | Latitude coordinate | 24.5 to 71.3 | 156 |
| `longitude` | float | Longitude coordinate | -179.1 to -66.9 | 156 |
| `closure_date` | date | Date brewery was marked closed | 2020-10-01 to 2025-12-31 | 0 |
| `closure_year` | integer | Year of closure | 2020-2025 | 0 |
| `fips_county` | string | 5-digit FIPS county code | 01001-56045 | 203 |
| `closure_source` | string | How closure was identified | "deletion", "status_change" | 0 |
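Several columns hold codes with leading zeros (ZIP codes such as `02110`, FIPS codes starting with `01` for Alabama), so they should be read as text rather than integers. Below is a minimal pandas loading sketch that assumes the column layout in the table above. The helper name `load_closures` and the two sample rows are made up for illustration.

```python
import io

import pandas as pd

# Identifier columns kept as text so leading zeros survive (e.g. FIPS "01081").
STRING_COLUMNS = [
    "brewery_id", "name", "street", "city", "state",
    "postal_code", "fips_county", "closure_source",
]

# Two made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = (
    "brewery_id,name,street,city,state,postal_code,latitude,longitude,"
    "closure_date,closure_year,fips_county,closure_source\n"
    "id-1,Example Brewing,1 Main St,Boston,MA,02110,42.36,-71.05,"
    "2021-03-15,2021,25025,deletion\n"
    "id-2,Sample Ales,2 Oak Ave,Auburn,AL,36830,32.61,-85.48,"
    "2023-08-02,2023,01081,status_change\n"
)


def load_closures(source) -> pd.DataFrame:
    """Read brewery_closures.csv with explicit dtypes (hypothetical helper)."""
    df = pd.read_csv(
        source,
        dtype={col: "string" for col in STRING_COLUMNS},
        parse_dates=["closure_date"],
    )
    # closure_year should always agree with the year of closure_date.
    mismatched = df["closure_year"] != df["closure_date"].dt.year
    if mismatched.any():
        raise ValueError(f"{int(mismatched.sum())} rows have inconsistent closure_year")
    return df


if __name__ == "__main__":
    print(load_closures(io.StringIO(SAMPLE_CSV))[["postal_code", "fips_county"]])
```

Reading these columns as integers would silently turn `01081` into `1081` and break joins against Census county files.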

### Notes on brewery_closures.csv

- **closure_date**: Derived from the timestamp of the Git commit in which the brewery was removed or its status was changed
- **latitude/longitude**: Missing values indicate addresses that could not be geocoded
- **fips_county**: Assigned via spatial join with Census county boundaries
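Closure events of the two `closure_source` types can be found by diffing successive snapshots of the brewery list. This sketch assumes each commit snapshot is a mapping from `brewery_id` to a `brewery_type` string, and that a status change means the type became `"closed"`. Both are assumptions about the upstream data, not a description of the original pipeline, and `detect_closures` is a hypothetical name.

```python
from datetime import datetime


def detect_closures(snapshots):
    """Return closure records from chronologically ordered commit snapshots.

    snapshots: list of (commit_timestamp, {brewery_id: brewery_type}) tuples.
    A brewery closes when it disappears ("deletion") or when its type changes
    to "closed" ("status_change"). Only the first event per brewery counts.
    """
    closures = {}
    for (_, prev), (ts, curr) in zip(snapshots, snapshots[1:]):
        for brewery_id, prev_type in prev.items():
            # Skip breweries already recorded or already closed before this diff.
            if brewery_id in closures or prev_type == "closed":
                continue
            if brewery_id not in curr:
                closures[brewery_id] = (ts, "deletion")
            elif curr[brewery_id] == "closed":
                closures[brewery_id] = (ts, "status_change")
    return [
        {"brewery_id": bid, "closure_date": ts.date(),
         "closure_year": ts.year, "closure_source": source}
        for bid, (ts, source) in closures.items()
    ]


if __name__ == "__main__":
    snaps = [
        (datetime(2020, 10, 1), {"a": "micro", "b": "brewpub"}),
        (datetime(2021, 3, 5), {"a": "micro", "b": "closed"}),
        (datetime(2022, 7, 9), {"b": "closed"}),
    ]
    for rec in detect_closures(snaps):
        print(rec)
```

Because the commit timestamp stands in for the closure date, the result inherits the lag described under Data Quality Notes.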

---

## File: panel_analysis_data.csv

Quarterly panel dataset for difference-in-differences analysis. Unit of observation is walkshed × quarter.

| Variable | Type | Description | Values/Range | Missing |
|----------|------|-------------|--------------|---------|
| `walkshed_id` | string | Unique walkshed identifier | W_00001 to W_01530 | 0 |
| `quarter` | string | Calendar quarter | 2018Q1 to 2025Q4 | 0 |
| `treatment` | integer | Treatment group indicator | 0 = control, 1 = treatment | 0 |
| `post_period` | integer | Post-treatment period | 0 = pre, 1 = post | 0 |
| `outcome` | float | USPS vacancy rate (%) | 0.0 to 35.2 | 0 |
| `zhvi_logchange` | float | Log change in Zillow Home Value Index (ZHVI) since 2019Q4 | -0.89 to 0.67 | 127 |
| `closure_count` | integer | Number of closures in walkshed | 0 to 23 | 0 |
| `cluster_type` | string | Agglomeration classification | "isolated", "clustered" | 0 |
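The `quarter` strings use the same notation as pandas quarterly periods, so they can be parsed directly. The hedged sketch below converts them to periods and checks whether each walkshed covers all 32 quarters from 2018Q1 to 2025Q4. The helper name `check_panel` is hypothetical.

```python
import pandas as pd

EXPECTED_QUARTERS = pd.period_range("2018Q1", "2025Q4", freq="Q")


def check_panel(df: pd.DataFrame) -> pd.DataFrame:
    """Parse `quarter` to quarterly Periods and report per-walkshed coverage."""
    out = df.copy()
    out["quarter"] = pd.PeriodIndex(out["quarter"], freq="Q")
    # Each (walkshed_id, quarter) pair should appear at most once.
    if out.duplicated(["walkshed_id", "quarter"]).any():
        raise ValueError("duplicate walkshed-quarter rows")
    counts = out.groupby("walkshed_id")["quarter"].nunique()
    return counts.rename("n_quarters").to_frame().assign(
        balanced=lambda t: t["n_quarters"] == len(EXPECTED_QUARTERS)
    )
```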

### Variable Definitions

**treatment**: Binary indicator for high-closure walksheds
- 1 = Above-median closures per capita (typically ≥3 closures)
- 0 = At or below median closures per capita (typically 0-2 closures, brewery retained)

**post_period**: Binary indicator for post-treatment quarters
- 1 = Quarters after median closure date in walkshed
- 0 = Quarters before median closure date
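The definition above does not say how the quarter that contains the median closure date is coded. This sketch assumes it counts as post, which matches the event study's convention that `time = 0` is a post-treatment period. With an even number of closures, it takes the midpoint between the two middle dates (rounded down to a day), which is also an assumption. Walksheds with no closures have no median closure date and the codebook does not state how their post period is defined, so the helper returns `None` for them. Function names are hypothetical.

```python
from datetime import date
from statistics import median

import pandas as pd


def median_closure_date(closure_dates):
    """Median of closure dates (midpoint of the two middle dates if even)."""
    if not closure_dates:
        return None  # no closures: median undefined
    ordinals = [d.toordinal() for d in closure_dates]
    return date.fromordinal(int(median(ordinals)))


def post_period(quarter: str, cutoff: date) -> int:
    """1 if the quarter ends on or after the cutoff, else 0.

    Under this rule the quarter containing the median closure date is post.
    """
    q = pd.Period(quarter, freq="Q")
    return int(q.end_time.date() >= cutoff)
```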

**outcome**: USPS vacancy rate, expressed as a percentage
- Calculated as: (vacant addresses / total addresses) × 100
- Higher values = larger share of vacant addresses, interpreted as greater neighborhood distress

**zhvi_logchange**: Property value trajectory
- Calculated as: log(ZHVI_t) - log(ZHVI_Q42019)
- Negative values = price decline relative to baseline
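Computing the baseline-relative log change means looking up each walkshed's own 2019Q4 value. The pandas sketch below assumes a long table with `walkshed_id`, `quarter`, and a hypothetical `zhvi` level column; ZHVI levels are not part of the released panel. It also assumes one row per walkshed in the base quarter.

```python
import numpy as np
import pandas as pd


def zhvi_logchange(df: pd.DataFrame, base_quarter: str = "2019Q4") -> pd.Series:
    """log(ZHVI_t) - log(ZHVI_base) within each walkshed.

    Expects columns walkshed_id, quarter, zhvi (hypothetical level column).
    Rows whose walkshed lacks a base-quarter value come out as NaN.
    """
    base = df.loc[df["quarter"] == base_quarter].set_index("walkshed_id")["zhvi"]
    base_for_row = df["walkshed_id"].map(base)
    return np.log(df["zhvi"]) - np.log(base_for_row)
```

Walksheds with no 2019Q4 value (or missing ZHVI in a given quarter) produce NaN, which is consistent with the listwise deletion noted under Data Quality Notes.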

**cluster_type**: Spatial agglomeration context
- "clustered" = Brewery within 15-min walk of ≥2 other breweries at time of closure
- "isolated" = Fewer than 2 other breweries within a 15-min walk at time of closure

---

## File: closures_by_state.csv

State-level aggregation of closure counts.

| Variable | Type | Description | Values/Range | Missing |
|----------|------|-------------|--------------|---------|
| `state` | string | State name | 50 states + DC | 0 |
| `total_closures` | integer | Total closures 2020-2025 | 12 to 1,847 | 0 |
| `pct_of_total` | float | Percentage of national closures | 0.08 to 11.59 | 0 |
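The state table can be rebuilt from `brewery_closures.csv` by counting rows per state and dividing by the national total. As a check on the stated range, 1,847 / 15,936 ≈ 11.59% and 12 / 15,936 ≈ 0.08%. The closure file's `state` column may hold either names or abbreviations, so this sketch assumes those have already been harmonized to full names. The function name is hypothetical.

```python
import pandas as pd


def closures_by_state(closures: pd.DataFrame) -> pd.DataFrame:
    """Aggregate closure rows to state totals and national shares (percent)."""
    totals = closures.groupby("state").size().rename("total_closures")
    out = totals.reset_index()
    out["pct_of_total"] = (100 * out["total_closures"] / out["total_closures"].sum()).round(2)
    return out.sort_values("total_closures", ascending=False, ignore_index=True)
```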

---

## File: did_results.csv

Difference-in-differences regression output (Table 1 in the paper).

| Variable | Type | Description |
|----------|------|-------------|
| `variable` | string | Regression term name |
| `coefficient` | float | Point estimate |
| `std_error` | float | Robust standard error |
| `t_stat` | float | t-statistic |
| `p_value` | float | p-value (two-tailed) |
| `ci_lower` | float | 95% CI lower bound |
| `ci_upper` | float | 95% CI upper bound |

### Rows

1. `Constant` - Intercept
2. `Treatment` - Treatment group baseline difference
3. `Post Period` - Pre-to-post change for the control group
4. `Treatment × Post` - **DiD estimator** (causal effect under the parallel trends assumption)
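For a regression of `outcome` on `treatment`, `post_period`, and their interaction with no other covariates, OLS point estimates equal simple functions of the four cell means. The sketch below recovers the four rows that way. It does not reproduce the standard errors, which require fitting the regression itself (for example with statsmodels' `ols(...).fit(cov_type=...)`). The robust covariance type used for Table 1 is not specified here, and the function name is hypothetical.

```python
import pandas as pd


def did_from_means(df: pd.DataFrame) -> dict:
    """Recover the four Table 1 terms from cell means of a 2x2 design.

    Matches OLS of outcome ~ treatment * post_period with no other
    covariates (point estimates only, no standard errors).
    """
    m = df.groupby(["treatment", "post_period"])["outcome"].mean()
    constant = m[(0, 0)]                       # control group, pre period
    treatment = m[(1, 0)] - m[(0, 0)]          # baseline gap between groups
    post = m[(0, 1)] - m[(0, 0)]               # control group's pre-to-post change
    did = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
    return {"Constant": constant, "Treatment": treatment,
            "Post Period": post, "Treatment × Post": did}
```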

---

## File: event_study_results.csv

Event study coefficients (Figure 4 in the paper).

| Variable | Type | Description | Values/Range |
|----------|------|-------------|--------------|
| `time` | integer | Years relative to treatment | -3, -2, 0, 1, 2 |
| `coef` | float | Point estimate | -0.08 to 1.09 |
| `ci_lower` | float | 95% CI lower bound | -0.21 to 0.89 |
| `ci_upper` | float | 95% CI upper bound | 0.05 to 1.29 |

### Notes

- `time = -1` is the omitted reference period (coefficient normalized to 0)
- Pre-treatment coefficients (`time < 0`) assess the parallel trends assumption; estimates near zero are consistent with it
- Post-treatment coefficients (`time ≥ 0`) show dynamic treatment effects
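An event study regression needs one indicator per event time, with the reference period left out so that its coefficient is normalized to 0. This sketch assumes an integer `time` column (years relative to treatment) on a long table. It shows only the dummy construction, not the full specification (fixed effects and interactions with treatment are not described here). The function name is hypothetical.

```python
import pandas as pd


def event_time_dummies(time: pd.Series, reference: int = -1) -> pd.DataFrame:
    """One 0/1 column per event time, omitting the reference period."""
    dummies = pd.get_dummies(time, prefix="t", dtype=int)
    return dummies.drop(columns=f"t_{reference}", errors="ignore")
```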

---

## Constructed Variables

### Walkshed Construction

Walksheds are 15-minute network isochrones computed via OSMnx:

```
walkshed_i = {all street network nodes reachable within 15 minutes
              walking at 4.5 km/h from brewery_i location}
```

Walkshed boundaries are converted to polygons for spatial joins with Census geographies.
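OSMnx isochrone workflows typically add a travel-time attribute to each edge and then keep the nodes within the time budget, for example with networkx's `ego_graph` using its `distance` argument. The sketch below reproduces that reachability rule with a plain Dijkstra search over a toy adjacency list, so the definition can be checked without downloading OpenStreetMap data. At 4.5 km/h, 15 minutes corresponds to 4,500 × 15 / 60 = 1,125 m of network distance. The graph, node labels, and edge lengths are invented for illustration.

```python
import heapq

WALK_SPEED_KMH = 4.5
TIME_BUDGET_MIN = 15
# Metres walkable in the time budget: 4500 m/h * 15/60 h = 1125 m.
MAX_METRES = WALK_SPEED_KMH * 1000 * TIME_BUDGET_MIN / 60


def walkshed(graph, origin, max_metres=MAX_METRES):
    """Nodes reachable from origin within max_metres of network distance.

    graph: {node: [(neighbor, edge_length_m), ...]}; undirected streets
    should be listed in both directions.
    Returns {node: shortest network distance in metres}.
    """
    best = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in graph.get(node, []):
            new = dist + length
            if new <= max_metres and new < best.get(nbr, float("inf")):
                best[nbr] = new
                heapq.heappush(heap, (new, nbr))
    return best
```

The reachable node set would then still need to be turned into a polygon for the spatial joins. The codebook does not say which method was used (convex hull, buffered edges, and so on), so that step is left out here.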

### Treatment Assignment

```
treatment_i = 1  if  (closures_i / population_i) > median(closures/population)
            = 0  otherwise
```

Here `closures_i` is the count of brewery closures within walkshed `i` during the study period, and `population_i` is the walkshed's population.
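The assignment rule translates directly into pandas. This sketch assumes one row per walkshed with hypothetical `closures` and `population` columns (the population source is not described in this codebook). Because the comparison is strict, a walkshed exactly at the median goes to the control group.

```python
import pandas as pd


def assign_treatment(df: pd.DataFrame) -> pd.Series:
    """1 if closures per capita exceed the median across walksheds, else 0.

    Expects one row per walkshed with columns closures and population.
    Walksheds exactly at the median fall in the control group (strict >).
    """
    rate = df["closures"] / df["population"]
    return (rate > rate.median()).astype(int)
```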

### Cluster Classification

```
cluster_type_i = "clustered"  if  count(other breweries within 15-min walk) >= 2
               = "isolated"   otherwise
```

Evaluated at the time of closure for each brewery.
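This sketch applies the rule to one closing brewery. It assumes you already know which other breweries fall inside its walkshed, along with each neighbor's own closure date (`None` if it never closed). A neighbor counts if it was still open on the focal closure date. Treating a same-day closure as still open is an assumption, as is ignoring opening dates, which the codebook does not describe. The function name is hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def cluster_type(focal_closure: date, neighbor_closures: Iterable[Optional[date]]) -> str:
    """Return "clustered" if >= 2 other breweries in the walkshed were open
    when the focal brewery closed, else "isolated".

    neighbor_closures: closure dates (or None if still open) for the other
    breweries inside the focal brewery's 15-minute walkshed.
    """
    open_neighbors = sum(
        1 for d in neighbor_closures if d is None or d >= focal_closure
    )
    return "clustered" if open_neighbors >= 2 else "isolated"
```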

---

## Data Quality Notes

1. **Geocoding failures**: 156 breweries (1.0%) could not be geocoded due to incomplete addresses. These are excluded from spatial analysis but retained in aggregate counts.

2. **ZHVI coverage**: 127 walkshed-quarters (about 0.26% of the 1,530 × 32 = 48,960 walkshed-quarters in a balanced panel) have missing ZHVI data due to insufficient sales activity in rural ZIP codes. DiD analysis on ZHVI uses listwise deletion.

3. **Temporal precision**: Closure dates reflect Git commit timestamps, which may lag actual closure by days to weeks depending on OBDB contributor activity.

4. **Walkshed overlap**: Some walksheds overlap geographically. Overlapping areas are assigned to the walkshed of the nearest brewery (Euclidean distance).
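The overlap rule assigns each location to the brewery with the smallest straight-line distance. This minimal sketch assumes coordinates are already in a projected CRS measured in metres, because Euclidean distance on raw latitude/longitude degrees would distort east-west distances. It covers a single point. At scale, geopandas' `sjoin_nearest` performs the same kind of nearest-feature join. Names are hypothetical.

```python
import math


def assign_to_nearest(point, breweries):
    """Return the id of the brewery closest to point (Euclidean distance).

    point: (x, y) in projected metres.
    breweries: {brewery_id: (x, y)} for the walksheds that overlap at point.
    Ties go to the first brewery in iteration order.
    """
    return min(breweries, key=lambda bid: math.dist(point, breweries[bid]))
```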

---

## Software and Versions

| Package | Version | Purpose |
|---------|---------|---------|
| Python | 3.11.4 | Runtime |
| pandas | 2.0.3 | Data manipulation |
| numpy | 1.24.3 | Numerical operations |
| statsmodels | 0.14.0 | Regression analysis |
| geopandas | 0.14.0 | Spatial operations |
| osmnx | 1.6.0 | Network analysis |
| scipy | 1.11.1 | Statistical functions |

---

## Contact

For questions about this dataset, contact:

**Grant Glass**
North Carolina State University
gglass@ncsu.edu
